Writing Data from Spark to a Hive Partitioned Table


2024-04-27 04:14 | Source: compiled from the web

0. Log in to Hive (beeline is used here)

    [secret ~]$ beeline
    beeline> !connect jdbc:hive2://10.1.1.1:10000
    Enter username for jdbc:hive2://10.1.1.1:10000: secret
    Enter password for jdbc:hive2://10.1.1.1:10000: *************
    0: jdbc:hive2://10.1.1.1:10000> show databases;
    0: jdbc:hive2://10.1.1.1:10000> use db_iot;
    0: jdbc:hive2://10.1.1.1:10000> show tables;
    0: jdbc:hive2://10.1.1.1:10000> describe iotdata;

1. Create the database

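There are two ways to create the database: from the Hive client (or a beeline connection to Hive), or from Spark. This article uses the Hive shell; in Spark it is enough to call hiveContext.sql(command).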
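For the Spark route, a minimal sketch might look like the following. It assumes Spark 2.0 or later, where a Hive-enabled SparkSession takes the place of hiveContext; the object name and application name are made up for illustration, and on Spark 1.x the same statements would be passed to hiveContext.sql instead.

```scala
import org.apache.spark.sql.SparkSession

object CreateIotDatabase {
  def main(args: Array[String]): Unit = {
    // Assumption: Spark 2.0+ with Hive support available on the classpath.
    val spark = SparkSession.builder()
      .appName("create-db-iot") // hypothetical application name
      .enableHiveSupport()
      .getOrCreate()

    // The same statements that the Hive shell would run.
    spark.sql("create database if not exists db_iot")
    spark.sql("use db_iot")

    // List the databases to confirm that db_iot now exists.
    spark.sql("show databases").show()

    spark.stop()
  }
}
```

The Hive shell statements used in this article are: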

    create database if not exists db_iot;
    use db_iot;
    -- Drop the database
    -- drop database if exists db_iot;

2. Create the table

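As with the database, the table can be created in either of the two ways; the Hive shell is used here. Note: if you skip this step and write directly from Spark with saveAsTable while specifying a partition column, Hive can query the table's data but cannot show its creation and modification information, and the table created this way is not a partitioned table.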

    -- Defines the data columns and their types, the partition column (ns_date),
    -- the Parquet SerDe used for serialization, and the input/output formats.
    -- If the SerDe cannot be defined through hiveContext, check that the
    -- required jar is present.
    create table if not exists iotdata_test (
      ip_port string,
      ip string,
      country string,
      province string,
      city string,
      services_update_time string,
      services_layer_transport_port string,
      services_device_type string
    )
    partitioned by (ns_date string)
    row format serde 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
    stored as
      inputformat 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat'
      outputformat 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat';

    -- Drop the table
    -- drop table if exists iotdata_test;

3. Write the DataFrame to the partitioned table

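The column order of the DataFrame being written must match the column order of the table definition, because insertInto matches columns by position rather than by name. The partition column ns_date therefore has to come last.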

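    df.write.mode(SaveMode.Append).format("parquet").partitionBy("ns_date").insertInto("db_iot.iotdata_test")

Here df is the DataFrame to write. On Spark 2.0 and later, insertInto() cannot be combined with partitionBy() and raises an AnalysisException; drop the partitionBy("ns_date") call there, since the table definition already declares the partition column.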
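A fuller sketch of the write step is given here, assuming Spark 2.0 or later with a Hive-enabled SparkSession. The helper name appendToTable, the object name, and the sample row are all made up for illustration. The two SET statements are an assumption about the cluster: they enable Hive dynamic partitioning, which some setups require before an insert can derive ns_date from the data.

```scala
import org.apache.spark.sql.{DataFrame, SaveMode, SparkSession}
import org.apache.spark.sql.functions.col

object WriteIotData {
  // Hypothetical helper: reorder df's columns to the table's order, then append.
  def appendToTable(spark: SparkSession, df: DataFrame, table: String): Unit = {
    // Columns of the Hive table in definition order; the partition column is last.
    val targetColumns = spark.table(table).columns
    df.select(targetColumns.map(col): _*)
      .write
      .mode(SaveMode.Append)
      .insertInto(table) // position-based; no partitionBy on Spark 2.0+
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("write-iotdata") // hypothetical application name
      .enableHiveSupport()
      .getOrCreate()
    import spark.implicits._

    // Assumption: dynamic partitioning is not already enabled on the cluster.
    spark.sql("SET hive.exec.dynamic.partition=true")
    spark.sql("SET hive.exec.dynamic.partition.mode=nonstrict")

    // Made-up sample row; real data would come from your own source.
    val df = Seq(
      ("192.0.2.1:80", "192.0.2.1", "CN", "Beijing", "Beijing",
        "2018-01-01 00:00:00", "80", "camera", "20180101")
    ).toDF("ip_port", "ip", "country", "province", "city",
      "services_update_time", "services_layer_transport_port",
      "services_device_type", "ns_date")

    appendToTable(spark, df, "db_iot.iotdata_test")
    spark.stop()
  }
}
```

Selecting the columns by the table's own order guards against the position mismatch described above: a DataFrame whose columns are named correctly but ordered differently would otherwise be written into the wrong fields without any error.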

That completes the write. You can check the result with the following commands:

    show create table iotdata_test;
    show partitions iotdata_test;
    select * from iotdata_test limit 10;

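One more note: the table's storage path on HDFS also accumulates .hive-staging_hive_2018…… entries, one for each database operation that is run. Their location is set in the Hive configuration (the hive.exec.stagingdir property); you can either point it at a different path or clean the entries up on a schedule.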
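The scheduled cleanup could be sketched with the Hadoop FileSystem API as below. This is only an illustration under stated assumptions: the table path and the seven-day age threshold are hypothetical, and the age check is there so that staging directories belonging to jobs still in progress are left alone.

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

object CleanHiveStaging {
  def main(args: Array[String]): Unit = {
    // Hypothetical table location; substitute the real path of your table.
    val tableDir = new Path("/user/hive/warehouse/db_iot.db/iotdata_test")

    // Assumed threshold: only remove entries untouched for seven days.
    val maxAgeMs = 7L * 24 * 60 * 60 * 1000
    val cutoff = System.currentTimeMillis() - maxAgeMs

    // Uses the default file system from the Hadoop configuration on the classpath.
    val fs = FileSystem.get(new Configuration())

    fs.listStatus(tableDir)
      .filter(status => status.getPath.getName.startsWith(".hive-staging"))
      .filter(status => status.getModificationTime < cutoff)
      .foreach { status =>
        println(s"Deleting ${status.getPath}")
        fs.delete(status.getPath, true) // recursive delete
      }
  }
}
```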
